26/06/2018

Welcome, and overview

First session:

  • Not heaps of paleo-specific R
  • But building blocks to make you an expeRt
  • Things that go into R (data inputs)
  • How to structure your data inputs and outputs
  • Getting started in R

Welcome, and overview

Second session:

  • Data validation
  • Data visualisation
  • NMDS
  • RDA
  • Plotting NMDS etc for publications
  • Saving and export

BD (before data): project structures

  1. Raw data (as entered)
  2. Corrected and modified data

BD (before data): project structures

Keep a record of how you went from (1) to (2) - even if you don't do it in R

  1. Correct/modify data in R (with reminders)
  2. Create a new spreadsheet and keep a .txt record of what you changed
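
Option 1 might look something like the sketch below. The file names, the columns and the corrections are all made up for illustration, and a temporary folder stands in for the project folder. The point is only that the raw file is never overwritten and that each change carries a comment saying why it was made.

```r
# Hypothetical example: the data, file names and corrections are invented.
# A temporary folder stands in for your project folder.
project_dir <- tempdir()
raw_file <- file.path(project_dir, "raw-data.csv")
corrected_file <- file.path(project_dir, "corrected-data.csv")

# Pretend this is the spreadsheet exactly as it was entered
write.csv(
  data.frame(site = c("eweburn", "Eweburn ", "eweburn"),
             depth_cm = c(10, 20, 300)),
  raw_file,
  row.names = FALSE
)

# 1. Read the raw data; this file is never edited by hand
raw <- read.csv(raw_file, stringsAsFactors = FALSE)
corrected <- raw

# 2. Reminder: site names were typed inconsistently (capitals, trailing space)
corrected$site <- tolower(trimws(corrected$site))

# 3. Reminder: the depth of 300 cm was a typo for 30 cm
corrected$depth_cm[corrected$depth_cm == 300] <- 30

# 4. Save the corrected data as a new file, leaving the raw file untouched
write.csv(corrected, corrected_file, row.names = FALSE)
```

Because the script itself is the record, re-running it on the raw file should always reproduce the corrected file.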

Project structures

Where should a project live?

Pros and cons of the following:

  • MW-LCR shared drives
  • Dropbox (C:/)
  • GitHub (C:/)
  • MW-LCR personal drive

Where should a project live

githubScreenshot

Within the project: naming files

Machine readable

  • no punctuation symbols
  • no spaces
  • be careful with capitals
  • for data, easy to parse

Machine readable

  • e.g. year_site_coreNUM_type
  • e.g. 2018-06-30_eweburn_X18-062_concentrations.csv
  • e.g. 2018-06-30_eweburn_X18-062_age-depth.csv
  • e.g. 2018-06-30_eweburn_X18-062_species-dictionary.csv

Note that we separate units of metadata with a "_" and, within a unit, separate words with a "-".
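
One payoff of this convention is that R can pull the metadata back out of the file name. The sketch below is a minimal illustration: the file name is modelled on the examples above, and the labels date, site, core and type are our own choice rather than anything R requires.

```r
# Illustrative file name following the date_site_core_type pattern
file_name <- "2018-06-26_eweburn_X18-062_concentrations.csv"

# Drop the extension, then split on "_" to get the units of metadata.
# Hyphens inside a unit (the date, the core number) are left alone.
stem <- tools::file_path_sans_ext(file_name)
parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
names(parts) <- c("date", "site", "core", "type")

parts[["site"]]
parts[["core"]]
as.Date(parts[["date"]])  # an ISO date (year-month-day) converts directly
```

If a site name contained a "_" the split would produce too many pieces, which is one reason to keep "_" for separating units only.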

Human readable

  • This applies to scripts and data

  • e.g. 1_data-cleaning-vegetation.R
  • e.g. 2_data-cleaning-species-dictionary.R
  • e.g. function_plot-all-species.R
  • e.g. function_clean-italics-tilia.R

Group discussion - data & spreadsheets

example data screenshot

Booting up the R

rstudio

The basics (1)

getwd()
## [1] "/Users/oliviaburge/Documents/paleo-R-workshop/1-folders-spreadsheets-organisingData"

The basics (2)

setwd("path/to/your/project")  # placeholder: use the path to your own project folder

The basics (3)

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.5
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] DiagrammeR_1.0.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.17       highr_0.6          pillar_1.2.1      
##  [4] compiler_3.5.0     RColorBrewer_1.1-2 influenceR_0.1.0  
##  [7] plyr_1.8.4         bindr_0.1.1        viridis_0.5.1     
## [10] tools_3.5.0        digest_0.6.15      jsonlite_1.5      
## [13] viridisLite_0.3.0  gtable_0.2.0       evaluate_0.10.1   
## [16] tibble_1.4.2       rgexf_0.15.3       pkgconfig_2.0.1   
## [19] rlang_0.2.1        igraph_1.2.1       rstudioapi_0.7    
## [22] yaml_2.1.19        bindrcpp_0.2.2     gridExtra_2.3     
## [25] downloader_0.4     dplyr_0.7.5        stringr_1.3.1     
## [28] knitr_1.20         htmlwidgets_1.2    hms_0.4.2         
## [31] grid_3.5.0         rprojroot_1.3-2    tidyselect_0.2.4  
## [34] glue_1.2.0         R6_2.2.2           Rook_1.1-1        
## [37] XML_3.98-1.11      rmarkdown_1.10     ggplot2_2.2.1.9000
## [40] tidyr_0.8.0        purrr_0.2.4        readr_1.1.1       
## [43] magrittr_1.5       backports_1.1.2    scales_0.5.0      
## [46] htmltools_0.3.6    assertthat_0.2.0   colorspace_1.3-2  
## [49] brew_1.0-6         stringi_1.2.3      visNetwork_2.0.3  
## [52] lazyeval_0.2.1     munsell_0.4.3

The basics (4)

require(tidyverse)
## Loading required package: tidyverse
## ── Attaching packages ────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1.9000     ✔ purrr   0.2.4     
## ✔ tibble  1.4.2          ✔ dplyr   0.7.5     
## ✔ tidyr   0.8.0          ✔ stringr 1.3.1     
## ✔ readr   1.1.1          ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Example data

# install.packages("vegan")
# install.packages("skimr")
require(vegan)
require(skimr)
data("mite")
data("mite.env")

Viewing data

head(mite, n = 6)
##   Brachy PHTH HPAV RARD SSTR Protopl MEGR MPRO TVIE HMIN HMIN2 NPRA TVEL ONOV
## 1     17    5    5    3    2       1    4    2    2    1     4    1   17    4
## 2      2    7   16    0    6       0    4    2    0    0     1    3   21   27
## 3      4    3    1    1    2       0    3    0    0    0     6    3   20   17
## 4     23    7   10    2    2       0    4    0    1    2    10    0   18   47
## 5      5    8   13    9    0      13    0    0    0    3    14    3   32   43
## 6     19    7    5    9    3       2    3    0    0   20    16    2   13   38
##   SUCT LCIL Oribatl1 Ceratoz1 PWIL Galumna1 Stgncrs2 HRUF Trhypch1 PPEL NCOR
## 1    9   50        3        1    1        8        0    0        0    0    0
## 2   12  138        6        0    1        3        9    1        1    1    2
## 3   10   89        3        0    2        1        8    0        3    0    2
## 4   17  108       10        1    0        1        2    1        2    1    3
## 5   27    5        1        0    5        2        1    0        1    0    0
## 6   39    3        5        0    1        1        8    0        4    0    1
##   SLAT FSET Lepidzts Eupelops Miniglmn LRUG PLAG2 Ceratoz3 Oppiminu Trimalc2
## 1    0    0        0        0        0    0     0        0        0        0
## 2    2    2        1        0        0    0     0        0        0        0
## 3    0    8        0        0        0    0     0        0        0        0
## 4    2   12        0        0        0    0     0        0        0        0
## 5    0   12        2        0        0    0     0        0        0        0
## 6    0   10        0        0        0    0     0        0        0        0

Viewing data

skim(mite)
## Skim summary statistics
##  n obs: 70 
##  n variables: 35 
## 
## ── Variable type:integer ──────────────────────────────────────────────────────────────────
##  variable missing complete  n  mean    sd p0  p25  p50   p75 p100     hist
##    Brachy       0       70 70  8.73 10.08  0 3     4.5 11.75   42 ▇▂▁▂▁▁▁▁
##  Ceratoz1       0       70 70  1.29  1.46  0 0     1    2       5 ▇▆▁▃▁▁▁▁
##  Ceratoz3       0       70 70  1.3   2.2   0 0     0    2       9 ▇▁▁▁▁▁▁▁
##  Eupelops       0       70 70  0.64  0.99  0 0     0    1       4 ▇▃▁▁▁▁▁▁
##      FSET       0       70 70  1.86  3.18  0 0     0    2      12 ▇▂▁▁▁▁▁▁
##  Galumna1       0       70 70  0.96  1.73  0 0     0    1       8 ▇▁▁▁▁▁▁▁
##      HMIN       0       70 70  4.91  8.47  0 0     0    4.75   36 ▇▁▁▁▁▁▁▁
##     HMIN2       0       70 70  1.96  3.92  0 0     0    2.75   20 ▇▂▁▁▁▁▁▁
##      HPAV       0       70 70  8.51  7.56  0 4     6.5 12      37 ▇▇▃▃▁▁▁▁
##      HRUF       0       70 70  0.23  0.62  0 0     0    0       3 ▇▁▁▁▁▁▁▁
##      LCIL       0       70 70 35.26 88.85  0 1.25 13   44     723 ▇▁▁▁▁▁▁▁
##  Lepidzts       0       70 70  0.17  0.54  0 0     0    0       3 ▇▁▁▁▁▁▁▁
##      LRUG       0       70 70 10.43 12.66  0 0     4.5 17.75   57 ▇▂▂▁▁▁▁▁
##      MEGR       0       70 70  2.19  3.62  0 0     1    3      17 ▇▂▁▁▁▁▁▁
##  Miniglmn       0       70 70  0.24  0.79  0 0     0    0       5 ▇▁▁▁▁▁▁▁
##      MPRO       0       70 70  0.16  0.47  0 0     0    0       2 ▇▁▁▁▁▁▁▁
##      NCOR       0       70 70  1.13  1.65  0 0     0.5  1.75    7 ▇▃▂▂▁▁▁▁
##      NPRA       0       70 70  1.89  2.37  0 0     1    2.75   10 ▇▂▂▁▁▁▁▁
##      ONOV       0       70 70 17.27 18.05  0 5    10.5 24.25   73 ▇▃▂▁▁▁▁▁
##  Oppiminu       0       70 70  1.11  1.84  0 0     0    1.75    9 ▇▁▁▁▁▁▁▁
##  Oribatl1       0       70 70  1.89  3.43  0 0     0    2.75   17 ▇▁▁▁▁▁▁▁
##      PHTH       0       70 70  1.27  2.17  0 0     0    2       8 ▇▁▁▁▁▁▁▁
##     PLAG2       0       70 70  0.8   1.79  0 0     0    1       9 ▇▁▁▁▁▁▁▁
##      PPEL       0       70 70  0.17  0.54  0 0     0    0       3 ▇▁▁▁▁▁▁▁
##   Protopl       0       70 70  0.37  1.61  0 0     0    0      13 ▇▁▁▁▁▁▁▁
##      PWIL       0       70 70  1.09  1.71  0 0     0    1       8 ▇▁▁▁▁▁▁▁
##      RARD       0       70 70  1.21  2.78  0 0     0    1      13 ▇▂▁▁▁▁▁▁
##      SLAT       0       70 70  0.4   1.23  0 0     0    0       8 ▇▁▁▁▁▁▁▁
##      SSTR       0       70 70  0.31  0.97  0 0     0    0       6 ▇▁▁▁▁▁▁▁
##  Stgncrs2       0       70 70  0.73  1.83  0 0     0    0       9 ▇▁▁▁▁▁▁▁
##      SUCT       0       70 70 16.96 13.89  0 7.25 13.5 24      63 ▇▇▆▅▂▁▁▁
##  Trhypch1       0       70 70  2.61  6.14  0 0     0    2      29 ▇▁▁▁▁▁▁▁
##  Trimalc2       0       70 70  2.07  5.79  0 0     0    0      33 ▇▁▁▁▁▁▁▁
##      TVEL       0       70 70  9.06 10.93  0 0     3   19      42 ▇▁▁▂▁▁▁▁
##      TVIE       0       70 70  0.83  1.47  0 0     0    1       7 ▇▁▁▁▁▁▁▁

Live coding demo

  • Select some columns
  • Filter just some observations
  • Histogram of one column

Select - concept

select() chooses certain columns, either to keep or to get rid of. The format is:

DATANAME %>% select(col1, col2, col3)

Select - examples

mite %>% 
  select(Brachy, PHTH, HPAV)
##    Brachy PHTH HPAV
## 1      17    5    5
## 2       2    7   16
## 3       4    3    1
## 4      23    7   10
## 5       5    8   13
## 6      19    7    5
## 7      17    3    8
## 8       5    4    8
## 9       3    3    2
## 10     22    4    5
## 11     36    7   35
## 12     28    2   12
## 13      3    2    4
## 14     41    5   12
## 15      6    0    6
## 16      7    2    3
## 17      9    0    1
## 18     19    3    7
## 19     12    2   10
## 20      3    1    7
## 21      5    2    8
## 22      4    0    4
## 23     19    0    8
## 24      4    0    1
## 25     12    4   15
## 26      6    0    4
## 27      4    4    4
## 28      9    0    4
## 29     42    0    6
## 30     20    1    2
## 31     12    0    5
## 32      4    0    9
## 33     38    0   17
## 34      5    0   14
## 35      3    0    0
## 36      3    1    2
## 37      3    0    5
## 38      8    0    6
## 39      0    0    0
## 40      1    0   31
## 41      2    0   10
## 42      0    0   12
## 43      5    0    2
## 44      0    0    2
## 45     11    0    8
## 46      4    0    4
## 47      0    0    8
## 48      0    0    3
## 49     10    0   14
## 50      4    0   37
## 51      2    0    5
## 52      3    0    4
## 53      3    0   17
## 54      2    0    7
## 55      1    0    3
## 56      1    0   16
## 57      0    0    0
## 58      0    0   12
## 59      1    0    0
## 60      1    0   16
## 61      6    0    9
## 62      3    0    5
## 63     19    0    3
## 64      3    0   16
## 65      4    0   10
## 66      8    0   18
## 67      4    0    3
## 68      6    0   22
## 69     20    2    4
## 70      5    0   11

Select - examples

names(mite)
##  [1] "Brachy"   "PHTH"     "HPAV"     "RARD"     "SSTR"     "Protopl" 
##  [7] "MEGR"     "MPRO"     "TVIE"     "HMIN"     "HMIN2"    "NPRA"    
## [13] "TVEL"     "ONOV"     "SUCT"     "LCIL"     "Oribatl1" "Ceratoz1"
## [19] "PWIL"     "Galumna1" "Stgncrs2" "HRUF"     "Trhypch1" "PPEL"    
## [25] "NCOR"     "SLAT"     "FSET"     "Lepidzts" "Eupelops" "Miniglmn"
## [31] "LRUG"     "PLAG2"    "Ceratoz3" "Oppiminu" "Trimalc2"
mite %>% 
  select(-c(PHTH:Oppiminu))
##    Brachy Trimalc2
## 1      17        0
## 2       2        0
## 3       4        0
## 4      23        0
## 5       5        0
## 6      19        0
## 7      17        0
## 8       5        0
## 9       3        0
## 10     22        0
## 11     36        0
## 12     28        0
## 13      3        0
## 14     41        0
## 15      6        0
## 16      7        0
## 17      9        0
## 18     19        0
## 19     12        0
## 20      3        0
## 21      5        0
## 22      4        0
## 23     19        0
## 24      4        0
## 25     12        0
## 26      6        0
## 27      4        0
## 28      9        0
## 29     42        0
## 30     20        0
## 31     12        0
## 32      4        0
## 33     38        0
## 34      5        0
## 35      3        0
## 36      3        0
## 37      3        0
## 38      8        0
## 39      0        0
## 40      1        0
## 41      2        0
## 42      0        0
## 43      5        1
## 44      0        0
## 45     11        0
## 46      4        0
## 47      0        0
## 48      0        1
## 49     10        0
## 50      4        0
## 51      2        1
## 52      3        0
## 53      3        9
## 54      2        1
## 55      1        0
## 56      1        5
## 57      0        0
## 58      0        0
## 59      1        1
## 60      1        1
## 61      6        5
## 62      3        0
## 63     19        8
## 64      3       11
## 65      4       25
## 66      8        9
## 67      4       33
## 68      6       17
## 69     20        3
## 70      5       14

What is this? %>% %>% %>%

Chaining with the pipe, %>%, lets us write code in the order we want it done. Without it, each step has to be wrapped in brackets around the previous one, so the first thing to be done ends up right in the middle.

mite.env %>% select(Shrub, Topo)

means, take the mite.env dataframe, and then select the columns Shrub and Topo.
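
To make the comparison concrete, here is a small sketch of the same two steps written both ways. It assumes the dplyr and vegan packages are installed; the object names nested and piped are ours.

```r
library(dplyr)   # provides %>% and select()
library(vegan)   # provides the mite.env example data
data("mite.env")

# Nested: read from the inside out; select() runs first, head() last
nested <- head(select(mite.env, Shrub, Topo), n = 3)

# Piped: read top to bottom, in the order the steps happen
piped <- mite.env %>%
  select(Shrub, Topo) %>%
  head(n = 3)

# Both versions should give exactly the same result
identical(nested, piped)
```

With two steps the nested version is still readable, but each extra step adds another layer of brackets, whereas the piped version just gains another line.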

Filter - concept

filter() selects rows in your dataframe, based on the conditions you specify. As with select(), the data comes first, so filter(DATA, CONDITION1) is the same as DATA %>% filter(CONDITION1):

filter(DATA, CONDITION1)

  • row only selected if condition satisfied

Filter - concept

filter(DATA, CONDITION1 & CONDITION2)

  • both conditions must be satisfied

filter(DATA, CONDITION1 | CONDITION2)

  • one or both conditions must be satisfied
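
As a quick sketch of the difference between & and |, the code below applies the same two conditions to mite.env both ways. The conditions are picked purely for illustration, and the object names both and either are ours.

```r
library(dplyr)
library(vegan)
data("mite.env")

# AND: keep a row only if both conditions are TRUE
both <- mite.env %>%
  filter(Shrub == "Many" & Topo == "Hummock")

# OR: keep a row if at least one condition is TRUE
either <- mite.env %>%
  filter(Shrub == "Many" | Topo == "Hummock")

nrow(both)
nrow(either)
```

Every row that passes the & version also passes the | version, so the OR result can never have fewer rows than the AND result.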

Filter - example

mite.env %>% filter(WatrCont > 650)
##   SubsDens WatrCont Substrate Shrub    Topo
## 1    64.75   691.79   Sphagn2   Few Blanket
## 2    62.38   708.16  Barepeat   Few Blanket
## 3    52.73   656.35   Sphagn1  None Blanket
## 4    52.12   826.96   Sphagn1  None Blanket

Filter - example with two conditions

Here, WatrCont has to be greater than 650 and Shrub has to equal "Few". If you want to select two values (such as two sites), see the next slide.

mite.env %>% filter(WatrCont > 650 & Shrub == "Few")
##   SubsDens WatrCont Substrate Shrub    Topo
## 1    64.75   691.79   Sphagn2   Few Blanket
## 2    62.38   708.16  Barepeat   Few Blanket

Filter - example selecting > 1 categorical element

unique(mite.env$Substrate)
## [1] Sphagn1   Litter    Interface Sphagn3   Sphagn4   Sphagn2   Barepeat 
## Levels: Sphagn1 Sphagn2 Sphagn3 Sphagn4 Litter Barepeat Interface
mite.env %>% filter(Substrate %in% c("Litter", "Barepeat", "Interface"))
##    SubsDens WatrCont Substrate Shrub    Topo
## 1     54.99   434.81    Litter   Few Hummock
## 2     46.07   371.72 Interface   Few Hummock
## 3     80.59   266.78 Interface  Many Blanket
## 4     61.43   310.70    Litter  Many Blanket
## 5     37.25   239.51 Interface  Many Blanket
## 6     59.93   350.64 Interface  Many Blanket
## 7     35.41   321.87 Interface   Few Hummock
## 8     29.56   296.95 Interface  Many Hummock
## 9     44.10   383.83 Interface  Many Blanket
## 10    38.61   145.68 Interface  Many Hummock
## 11    32.27   291.59 Interface  Many Hummock
## 12    35.30   293.49 Interface  Many Blanket
## 13    32.86   323.12 Interface  Many Hummock
## 14    37.33   284.27 Interface  Many Blanket
## 15    53.17   367.11 Interface  Many Blanket
## 16    34.76   393.62 Interface   Few Blanket
## 17    47.74   528.44 Interface   Few Blanket
## 18    34.26   398.20 Interface   Few Blanket
## 19    26.60   386.37 Interface   Few Blanket
## 20    56.65   581.00 Interface   Few Blanket
## 21    62.38   708.16  Barepeat   Few Blanket
## 22    46.81   538.51 Interface   Few Blanket
## 23    33.98   323.96 Interface   Few Blanket
## 24    28.29   434.28 Interface  None Blanket
## 25    26.83   414.65 Interface  None Blanket
## 26    31.98   447.65 Interface  None Blanket
## 27    41.38   532.88 Interface  None Blanket
## 28    56.82   613.39  Barepeat  None Blanket
## 29    47.03   626.36 Interface  None Blanket
## 30    48.59   634.75 Interface  None Blanket
## 31    35.03   482.27 Interface  None Blanket

Histogram: concept

  • We use the function ggplot(), which comes from the ggplot2 package.
  • ggplot() just initiates the plot.
  • Then we tell it to draw a histogram with the geom_histogram() part.

ggplot(data = DATAFRAME, aes(x = COLUMN_FOR_HISTOGRAM)) + geom_histogram()

First example plot: histogram

ggplot(data = mite.env, aes(x = SubsDens)) +
  geom_histogram() 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

First example plot: histogram

ggplot(data = mite.env, aes(x = SubsDens)) +
  geom_histogram(binwidth = 10) 

Reading in real data!

Actually, it should be real, but it should also be tidy

Reading in real data!

  • Where is the file?
  • How do we tell R to get there?
    • This also depends on where R has the working directory
  • We need to tell R how to get from the working directory to the file that we want to read in!

getwd()

setwd()
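
The sketch below walks through those questions in code. It uses a hypothetical layout, a folder called data holding a file called example.csv, created inside a temporary folder so that it runs anywhere. Neither name comes from the workshop files.

```r
# Hypothetical layout: a "data" folder inside the working directory,
# containing "example.csv". A temporary folder stands in for the project.
old_wd <- setwd(tempdir())   # setwd() returns the previous working directory
dir.create("data", showWarnings = FALSE)
write.csv(data.frame(depth_cm = c(10, 20)),
          file.path("data", "example.csv"),
          row.names = FALSE)

getwd()                      # where R is looking right now

# Path from the working directory to the file
path_to_file <- file.path("data", "example.csv")
file.exists(path_to_file)    # check the path before trying to read

example <- read.csv(path_to_file)

setwd(old_wd)                # put the working directory back
```

If file.exists() returns FALSE, either the working directory or the relative path is not what you think it is, and getwd() and list.files() are the first things to check.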

What happens with untidy data?

aMess <- read.csv("data/messyDataExample.csv")

head(aMess)
##   Ashburton.Lakes.weight.of.vegetation.harvest.subsample           X X.1
## 1                                                                       
## 2                                                  Date:                
## 3                                              Lab team:                
## 4                                                                       
## 5                                                                       
## 6                                                        Wet weight     
##            X.2          X.3          X.4                    X.5 X.6
## 1                                                                  
## 2                                                                  
## 3                                                                  
## 4                                                                  
## 5                                                                  
## 6 Wet weight 1 Wet weight 2 Wet weight 3 Average wet weight (g)    
##           X.7          X.8          X.9         X.10
## 1                                                   
## 2                                                   
## 3                                                   
## 4                                                   
## 5                                                   
## 6 Dry weight  Dry weight 1 Dry weight 2 Dry weight 3
##                     X.11 X.12 X.13 X.14 X.15 X.16 X.17 X.18 X.19 X.20 X.21
## 1                                    NA                                   
## 2                                    NA                                   
## 3                                    NA                                   
## 4                                    NA                                   
## 5                                    NA                                   
## 6 Average dry weight (g)             NA                                   
##   X.22 X.23 X.24 X.25 X.26 X.27 X.28 X.29 X.30 X.31 X.32
## 1             NA                                        
## 2             NA                                        
## 3             NA                                        
## 4             NA                                        
## 5             NA                                        
## 6             NA

What happens with untidy data?

  • To see the whole thing: View(aMess)

  • Compare the output of names(aMess) and names(mite.env)
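
To see where column names like X and X.1 come from, here is a made-up miniature of a messy spreadsheet with a title row and a blank row above the real header. The file contents and column names are invented. The skip argument shown at the end is just a quick way to check the diagnosis, not the fix used in the group task.

```r
# Invented example: title row, empty row, then the real header and data
messy_file <- tempfile(fileext = ".csv")
writeLines(
  c("Weight of vegetation subsample,,",
    ",,",
    "plot,wet_weight_g,dry_weight_g",
    "A,12.1,3.4",
    "B,10.8,2.9"),
  messy_file
)

# Read as-is: R treats the title row as the header
messy <- read.csv(messy_file)
names(messy)

# Skip the two rows above the real header
tidy <- read.csv(messy_file, skip = 2)
names(tidy)
str(tidy)
```

In the messy version the weights are not read as numbers, because the real header row ends up mixed in with the data.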

Let's fix it, and read it back in

[group task]

Read in tidied data

  • Where is it?
  • Where is the working directory?
  • Where is the file in relation to our working directory?
  • What did we call it?!

  • Then we can read it back in